The ATIS Spoken Language Systems Pilot Corpus
نویسندگان
چکیده
Speech research has made tremendous progress in the past using the following paradigm: de ne the research problem, collect a corpus to objectively measure progress, and solve the research problem. Natural language research, on the other hand, has typically progressed without the bene t of any corpus of data with which to test research hypotheses. We describe the Air Travel Information System (ATIS) pilot corpus, a corpus designed to measure progress in Spoken Language Systems that include both a speech and natural language component. This pilot marks the rst full-scale attempt to collect such a corpus and provides guidelines for future e orts.
منابع مشابه
Benchmark Tests For The Darpa Spoken Language Program
This paper documents benchmark tests implemented within the DARPA Spoken Language Program during the period November, 1992 January, 1993. Tests were conducted using the Wall Street Journal-based Continuous Speech Recognition (WSJ-CSR) corpus and the Air Travel Information System (ATIS) corpus collected by the Multi-site ATIS Data COllection Working (MADCOW) Group. The WSJ-CSR tests consist of t...
متن کاملNIST-DARPA Interagency Agreement: Spoken Language Program
1. To coordinate the design, development and distribution of speech and natural language corpora for the DARPA Spoken Language research community. 2. To design, coordinate implementation, and analyze results, of performance assessment "benchmark tests" for DARPA's speech recognition and spoken language understanding systems. 1. Completed production of the six-CD-ROM-set for ATIS0, and made this...
متن کاملDARPA February 1992 Pilot Corpus CSR "Dry Run" Benchmark Test Results
Continuous speech recognition research activities within the DARPA Spoken Language community have, within the past several years, been focussed on the Resource Management (RM) and Air Travel Information System (ATIS) corpora. Within the past year, plans have been developed for a large, multi-component "general-purpose English, large vocabulary, natural language, high perplexity corpus" known as...
متن کاملMulti-Site Data Collection for a Spoken Language Corpus
This paper describes a recently collected spoken language corpus for the ATIS (Air Travel Information System) domain. This data collection effort has been co-ordinated by MADCOW (Multi-site ATIS Data COllection Working group). We summarize the motivation for this effort, the goals, the implementation of a multi-site data collection paradigm, and the accomplishments of MADCOW in monitoring the c...
متن کاملExpanding the Scope of the ATIS Task: The ATIS-3 Corpus
The Air Travel Information System (ATIS) domain serves as the common evaluation task for ARPA"spoken language system developers. 1 To support this task, the Multi-Site ATIS Data COllection Working group (MADCOW) coordinates data collection activities. This paper describes recent MADCOW activities. In particular, this paper describes the migration of the ATIS task to a richer relational database...
متن کامل